Searching for discrimination rules in protease proteolytic cleavage activity using genetic programming with a min-max scoring function.

Authors

  • Zheng Rong Yang
  • Rebecca Thomson
  • T Charles Hodgman
  • Jonathan Dry
  • Austin K Doyle
  • Ajit Narayanan
  • XiKun Wu
Abstract

This paper presents an algorithm that extracts discriminant rules from oligopeptides for predicting protease proteolytic cleavage activity. The algorithm is based on genetic programming and has three key components: a min-max scoring function, reverse Polish notation (RPN), and the minimum description length principle. The min-max scoring function uses amino acid similarity matrices to measure the similarity between an oligopeptide and a rule, where a rule is a complex algebraic expression over amino acids rather than a simple pattern sequence. The Fisher ratio is then calculated on the resulting scores using the class labels associated with the oligopeptides, so the discriminant ability of each rule can be evaluated. Representing rules in RPN simplifies the evolutionary operations and therefore reduces the computational cost. To prevent overfitting, minimum description length is used to penalize over-complicated rules. The fitness function therefore combines the Fisher ratio with the minimum description length penalty for an efficient evolutionary process. When applied to four protease datasets (Trypsin, Factor Xa, Hepatitis C Virus and HIV protease cleavage site prediction), our algorithm performed better than C5, a conventional decision-tree induction method.
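The abstract names the parts of the fitness evaluation but not their exact formulas. The following is a minimal sketch that wires them together under stated assumptions, not the authors' implementation. A rule is assumed to be an RPN token list whose terminals `pos:AA` score the similarity between the residue at index `pos` and amino acid `AA`, with `min` and `max` combining scores (an assumed reading of "min-max"). The similarity table is a hypothetical toy rather than a real substitution matrix. The Fisher ratio uses the common two-class form (mean_pos - mean_neg)^2 / (var_pos + var_neg), and a simple per-token length penalty stands in for the paper's minimum description length term. All names (`similarity`, `score`, `fisher_ratio`, `fitness`, `penalty`) and the toy peptides are hypothetical.

```python
from statistics import mean, pvariance

# Hypothetical toy similarity table (NOT a real substitution matrix such as
# BLOSUM62): identical residues score 1.0, a few "similar" pairs score 0.5.
SIMILAR_PAIRS = {frozenset("KR"), frozenset("DE"), frozenset("IL"), frozenset("ST")}


def similarity(a: str, b: str) -> float:
    """Return the toy similarity between two one-letter amino acid codes."""
    if a == b:
        return 1.0
    return 0.5 if frozenset((a, b)) in SIMILAR_PAIRS else 0.0


def score(rule: list[str], peptide: str) -> float:
    """Evaluate an RPN rule on a peptide with a stack.

    Terminals look like "1:K" (similarity of the residue at index 1 to K);
    "min" acts like AND and "max" like OR. This encoding is an assumption.
    """
    stack: list[float] = []
    for token in rule:
        if token in ("min", "max"):
            right = stack.pop()
            left = stack.pop()
            stack.append(min(left, right) if token == "min" else max(left, right))
        else:
            pos, residue = token.split(":")
            stack.append(similarity(peptide[int(pos)], residue))
    if len(stack) != 1:
        raise ValueError(f"malformed RPN rule: {rule}")
    return stack[0]


def fisher_ratio(pos: list[float], neg: list[float], eps: float = 1e-9) -> float:
    """Two-class Fisher ratio; eps avoids division by zero when both variances are 0."""
    return (mean(pos) - mean(neg)) ** 2 / (pvariance(pos) + pvariance(neg) + eps)


def fitness(rule: list[str], positives: list[str], negatives: list[str],
            penalty: float = 0.01) -> float:
    """Fisher ratio minus a per-token length penalty (a crude stand-in for MDL)."""
    pos_scores = [score(rule, p) for p in positives]
    neg_scores = [score(rule, n) for n in negatives]
    return fisher_ratio(pos_scores, neg_scores) - penalty * len(rule)


if __name__ == "__main__":
    # Hypothetical trypsin-like toy data: positives carry K or R at index 1.
    positives = ["AKGA", "LRSA", "GKDE"]
    negatives = ["ADGA", "LSSA", "GEDE"]
    kr_rule = ["1:K", "1:R", "max"]      # "K or R at index 1"
    other_rule = ["0:A", "2:G", "min"]   # "A at index 0 and G at index 2"
    # The K/R rule separates the toy classes; the other rule does not.
    print(fitness(kr_rule, positives, negatives))
    print(fitness(other_rule, positives, negatives))
```

Because an RPN rule is a flat token list evaluated with a single stack, crossover and mutation can splice or replace token runs without rebuilding a tree, which is one plausible reason the abstract credits RPN with lowering computational cost.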

Similar resources

Mining HIV protease cleavage data using genetic programming with a sum-product function

MOTIVATION In order to design effective HIV inhibitors, studying and understanding the mechanism of HIV protease cleavage specification is critical. Various methods have been developed to explore the specificity of HIV protease cleavage activity. However, success in both extracting discriminant rules and maintaining high prediction accuracy is still challenging. The earlier study had employed g...

Molecular detection of proteolytic activity of human parechovirus 2A protein by gene expression

Parechoviruses form one of the nine genera in the picornaviridae family, and include two human pathogens: Human parechovirus type 1 and 2 (Hpev1 and Hpev2). The genome of picornaviruses encodes a single polyprotein, which undergoes a cleavage cascade performed by virus encoded proteases to give the final virus proteins. The primary cleavage occurs by 2A protein and this step is critical for vi...

Purification, Characterization and Thermodynamic Assessment of an Alkaline Protease by Geotrichum Candidum of Dairy Origin

Background: Alkaline proteases are an important group of enzymes with numerous industrial applications, including dairy food formulations. Objectives: The current study deals with the purification and characterization of an alkaline serine protease produced by Geotrichum candidum QAUGC01, isolated from indigenous fermented milk product, Dahi. ...

Purification and Characterization of 50 kDa Extracellular Metalloprotease from Serratia sp. ZF03

Background: Proteolytic enzymes have an important role in a variety of physiological and pathological functions. They have been used in therapeutic and pharmaceutical applications. Characterizations of extracellular proteases from various strains of S. marcescens indicate that most strains produce a very similar major metalloprotease. This metalloprotease (serrapeptidase, serrapeptase) is an impo...

Posynomial geometric programming problem subject to max–product fuzzy relation equations

In this article, we study a class of posynomial geometric programming problem (PGPF), with the purpose of minimizing a posynomial subject to fuzzy relational equations with max–product composition. With the help of auxiliary variables, the PGPF is converted into an equivalent programming problem whose objective function is a non-decreasing function with an auxiliary variable. Some pr...

Journal:
  • Bio Systems

Volume 72, Issue 1-2

Publication date: 2003